Identification of Telugu, Devanagari and English Scripts Using Discriminating Features

Author

  • M C Padma
Abstract

In a multi-script, multilingual environment, a document may contain text lines in more than one script or language. The different script regions of such a document must be identified so that each region can be passed to the OCR system for its language. In this context, this paper proposes a model to identify and separate text lines of Telugu, Devanagari and English scripts in a printed trilingual document. The proposed method uses distinctive features extracted from the top and bottom profiles of the printed text lines. The experiments used 1500 text lines for learning and 900 text lines for testing, and the reported performance is 99.67%.
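
The abstract does not say which features are taken from the profiles, so the following is only a minimal sketch of what a top and bottom profile of a text line might look like in code. It assumes a binarized line image given as rows of 0 (background) and 1 (ink); the names `top_bottom_profiles` and `flat_fraction`, and the flatness feature itself, are hypothetical illustrations and not the paper's method.

```python
from typing import List, Optional, Tuple

# Hypothetical input format: a text line as rows of 0 (background) / 1 (ink).
Image = List[List[int]]
Profile = List[Optional[int]]


def top_bottom_profiles(line: Image) -> Tuple[Profile, Profile]:
    """For each column, return the row index of the first ink pixel met
    when scanning from the top and from the bottom.
    Columns without ink get None."""
    height = len(line)
    width = len(line[0]) if height else 0
    top: Profile = []
    bottom: Profile = []
    for x in range(width):
        ink_rows = [y for y in range(height) if line[y][x]]
        top.append(ink_rows[0] if ink_rows else None)
        bottom.append(ink_rows[-1] if ink_rows else None)
    return top, bottom


def flat_fraction(profile: Profile, row: int) -> float:
    """Illustrative feature (an assumption, not from the paper):
    fraction of inked columns whose profile value equals `row`."""
    inked = [v for v in profile if v is not None]
    if not inked:
        return 0.0
    return sum(1 for v in inked if v == row) / len(inked)


# Toy 3x5 "text line": a horizontal stroke on the top row with two stems.
toy_line = [
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
top, bottom = top_bottom_profiles(toy_line)
```

A feature such as `flat_fraction(top, 0)` would be high for a line whose upper edge is a long horizontal stroke and lower for a line with an irregular upper edge; whether the paper uses anything of this kind is not stated in the abstract.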

Similar resources

Proposal on Handling Reph in Gurmukhi and Telugu Scripts

Chapter 9 of the Unicode standard [1] describes the representational model for encoding Indic scripts. Devanagari is described in Section 9.1; the principles of Indic scripts are covered in some detail in the introduction to Devanagari. The descriptions of the remaining Indic scripts were abbreviated highlighting any differences from Devanagari where appropriate. Some of the problems in this des...

Kannada, Telugu and Devanagari Handwritten Numeral Recognition with Probabilistic Neural Network: A Script Independent Approach

In this paper, a script-independent automatic numeral recognition system is proposed. A single algorithm is proposed for recognition of Kannada, Telugu and Devanagari handwritten numerals. In general, the number of classes for a numeral recognition system for a script/language is 10. Here, three scripts are considered for numeral recognition, forming 30 classes. In the proposed method 30 classes ha...

Wavelet Packet Based Texture Features for Automatic Script Identification

In a multi script environment, an archive of documents printed in different scripts is in practice. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify the script type of the document. In this paper, a novel texture-based approach is presented to identify the script type of the collection of documents printed in ten Indian scripts ...

A survey on optical character recognition for Bangla and Devanagari scripts

The past few decades have witnessed intensive research on optical character recognition (OCR) for Roman, Chinese, and Japanese scripts. A lot of work has also been reported on OCR efforts for various Indian scripts, like Devanagari, Bangla, Oriya, Tamil, Telugu, Malayalam, Kannada, Gurmukhi, Gujarati, etc. In this paper, we present a review of OCR work on Indian scripts, mainly on ...

A Survey of Feature Extraction and Classification Techniques Used In Character Recognition for Indian Scripts

The Constitution of India, under its Eighth Schedule, has recognized Hindi (in Devanagari script) and English as official languages of the Union Government, along with 22 other languages as Scheduled Languages, and has given status and official encouragement to these Scheduled Languages. Most of the optical recognition research work has been done on Devanagari, Telugu, and Bangla scripts etc. Develo...

Publication date: 2009